feat: add OpenAI diarization support by 8times4 · Pull Request #651 · TanStack/ai

8times4 · 2026-05-27T15:18:33Z

🎯 Changes

This change adds diarization support for OpenAI's gpt-4o-transcribe-diarize model, based on https://developers.openai.com/api/docs/guides/speech-to-text?lang=javascript

✅ Checklist

I have followed the steps in the Contributing guide.
I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

This change affects published code, and I have generated a changeset.
This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

New Features
- Added OpenAI speaker diarization support (gpt-4o-transcribe-diarize) for multi-speaker audio
- Added diarized_json response format with speaker-labeled segments
- Added configurable chunking strategy and diarization-related options
Documentation
- Updated transcription docs, adapter guides, examples, and best practices with diarization usage and constraints
Tests
- Added tests covering diarization requests, parsing/mapping, and validation rules

coderabbitai · 2026-05-27T15:26:08Z

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2c455a0-25d6-4921-8f26-77965d2791be

📥 Commits

Reviewing files that changed from the base of the PR and between 05dfb53 and fbb57a0.

📒 Files selected for processing (6)

.changeset/openai-transcription-diarization.md
docs/adapters/openai.md
docs/comparison/vercel-ai-sdk.md
docs/media/generation-hooks.md
docs/media/transcription.md
docs/reference/interfaces/TranscriptionOptions.md

✅ Files skipped from review due to trivial changes (5)

.changeset/openai-transcription-diarization.md
docs/media/generation-hooks.md
docs/comparison/vercel-ai-sdk.md
docs/adapters/openai.md
docs/reference/interfaces/TranscriptionOptions.md

📝 Walkthrough

Walkthrough

Adds end-to-end speaker diarization for OpenAI transcription: new gpt-4o-transcribe-diarize handling, diarized_json support across types, adapter logic and validation, tests covering defaults and error cases, and documentation/changeset updates.

Changes

OpenAI Transcription Diarization Feature

| Layer / File(s) | Summary |
| --- | --- |
| Response Format Type Contracts: `packages/ai/src/types.ts`, `packages/ai/src/activities/generateTranscription/index.ts`, `packages/ai-client/src/generation-types.ts`, `packages/ai-openai/src/audio/transcription-provider-options.ts`, `docs/reference/interfaces/TranscriptionOptions.md` | `responseFormat` unions are extended to include `'diarized_json'`. `OpenAITranscriptionProviderOptions` adds optional `chunking_strategy` supporting `'auto'`, VAD config, or `null`. |
| OpenAI Adapter Diarization Implementation: `packages/ai-openai/src/adapters/transcription.ts` | Adapter detects diarization-capable models, validates diarization options, maps requests to `diarized_json` when appropriate, auto-sets `chunking_strategy: 'auto'` for the diarize model by default, parses diarized segments into `TranscriptionSegment[]` with speaker/start/end/text, and preserves non-diarized paths. |
| Diarization Adapter Test Coverage: `packages/ai-openai/tests/transcription-adapter.test.ts` | Vitest suite verifies default diarization wiring (`diarized_json`, `chunking_strategy: 'auto'`), explicit options forwarding (server VAD, known speakers), `chunking_strategy: null` passthrough, alternative response formats on diarize model, and validation error cases (unsupported options, speaker metadata limits/mismatch). |
| Documentation and Changeset: `.changeset/openai-transcription-diarization.md`, `docs/media/transcription.md`, `docs/adapters/openai.md`, `docs/media/generation-hooks.md`, `docs/comparison/vercel-ai-sdk.md`, `docs/reference/interfaces/TranscriptionOptions.md`, `packages/ai/skills/ai-core/media-generation/SKILL.md` | Changeset and docs updated to document `gpt-4o-transcribe-diarize`, `diarized_json` format, `timestamp_granularities`, diarization `chunking_strategy` guidance, and updated Whisper examples using `responseFormat: 'verbose_json'`. |
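
The `chunking_strategy` defaulting described in the summary can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `ServerVadConfig` fields are assumed from OpenAI's public API documentation, and `resolveChunkingStrategy` is a hypothetical helper name.

```typescript
// Hypothetical types for illustration; the real definitions live in
// packages/ai-openai/src/audio/transcription-provider-options.ts and may differ.
type ServerVadConfig = {
  type: 'server_vad'
  // Optional tuning fields; names assumed from OpenAI's public API docs.
  prefix_padding_ms?: number
  silence_duration_ms?: number
  threshold?: number
}

type ChunkingStrategy = 'auto' | ServerVadConfig | null

const DIARIZE_MODEL = 'gpt-4o-transcribe-diarize'

// Hypothetical helper: decides which chunking_strategy to send.
function resolveChunkingStrategy(
  model: string,
  requested?: ChunkingStrategy,
): ChunkingStrategy | undefined {
  // An explicit value, including null, is forwarded untouched.
  if (requested !== undefined) return requested
  // Only the diarize model gets an automatic default.
  return model === DIARIZE_MODEL ? 'auto' : undefined
}
```

Treating `undefined` and `null` differently is what lets a caller opt out of the default with an explicit `chunking_strategy: null`, which matches the passthrough case the test suite is said to cover.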

Sequence Diagram

sequenceDiagram
  participant Adapter as OpenAI Adapter
  participant Validator as validateDiarizationOptions
  participant Mapper as mapResponseFormat
  participant OpenAI as OpenAI API
  participant Parser as Diarized Parser

  Adapter->>Adapter: Identify diarization-capable model
  Adapter->>Validator: Validate diarization options
  Validator-->>Adapter: Constraints enforced
  Adapter->>Mapper: Map responseFormat
  Mapper-->>Adapter: diarized_json selected or mapped format
  Adapter->>OpenAI: Create transcription request (response_format, chunking_strategy)
  OpenAI-->>Adapter: Diarized or non-diarized response
  Adapter->>Parser: Map segments with speaker labels
  Parser-->>Adapter: TranscriptionSegment[]
  Adapter-->>Adapter: Return structured transcription result
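
The final parsing step in the diagram, mapping diarized segments to `TranscriptionSegment[]`, might look like the sketch below. Both interfaces are simplified stand-ins limited to the speaker/start/end/text fields named in the walkthrough, and `mapDiarizedSegments` is a hypothetical name rather than the adapter's real function.

```typescript
// Assumed shape of one segment in a diarized_json response.
interface DiarizedSegment {
  speaker: string
  start: number
  end: number
  text: string
}

// Simplified stand-in for TranscriptionSegment in packages/ai/src/types.ts.
interface TranscriptionSegment {
  start: number
  end: number
  text: string
  /** Speaker identifier, if diarization is enabled */
  speaker?: string
}

// Hypothetical mapper: copies timing and text, and carries the speaker label over.
function mapDiarizedSegments(
  segments: ReadonlyArray<DiarizedSegment>,
): Array<TranscriptionSegment> {
  return segments.map((segment) => ({
    start: segment.start,
    end: segment.end,
    text: segment.text,
    speaker: segment.speaker,
  }))
}
```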

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

tombeckenham
jherr

Poem

🐰
Voices hop across the line,
Speakers sorted, timestamps fine,
Chunks arranged, each name defined,
JSON brings the chorus timed,
A rabbit cheers: "Diarize!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately summarizes the main feature added: OpenAI diarization support for the gpt-4o-transcribe-diarize model, which is the primary focus across all changes. |
| Description check | ✅ Passed | The PR description follows the template structure, includes a clear explanation of changes with an OpenAI API reference, and all checklist items are properly completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-openai/src/adapters/transcription.ts`:
- Around line 267-285: The diarization validation is missing a local guard for
responseFormat: update validateDiarizationOptions (used by transcribe and
guarded by isDiarizeTranscriptionModel) to throw when
modelOptions.responseFormat (or the mapped value from mapResponseFormat) is not
one of the allowed values ["json","text","diarized_json"]; ensure transcribe()
cannot send srt/vtt/verbose_json for diarize models by checking
modelOptions.responseFormat (or resolved response format) early and throwing a
clear error stating diarization models only support json, text, and
diarized_json; reference validateDiarizationOptions, transcribe,
mapResponseFormat, and isDiarizeTranscriptionModel when applying the change.
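
A local guard of the kind this comment asks for could be sketched as below. The allowed values come from the comment itself; `assertDiarizeResponseFormat` is a hypothetical helper name, and how it would be wired into `validateDiarizationOptions` is left open.

```typescript
// Formats the review comment lists as valid for diarization models.
const DIARIZE_RESPONSE_FORMATS = ['json', 'text', 'diarized_json'] as const

// Hypothetical guard: throws early instead of letting the API return a late 400.
function assertDiarizeResponseFormat(format: string | undefined): void {
  // No explicit format means the adapter picks its own default.
  if (format === undefined) return
  const allowed: ReadonlyArray<string> = DIARIZE_RESPONSE_FORMATS
  if (!allowed.includes(format)) {
    throw new Error(
      `Diarization models only support json, text, and diarized_json; received "${format}".`,
    )
  }
}
```

Calling it before the request is built would reject `srt`, `vtt`, and `verbose_json` for diarize models without affecting other models.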


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c4b4b31-fb90-4e00-9d8f-1454f513e089

📥 Commits

Reviewing files that changed from the base of the PR and between 5634f18 and a59d368.

📒 Files selected for processing (13)

.changeset/openai-transcription-diarization.md
docs/adapters/openai.md
docs/comparison/vercel-ai-sdk.md
docs/media/generation-hooks.md
docs/media/transcription.md
docs/reference/interfaces/TranscriptionOptions.md
packages/ai-client/src/generation-types.ts
packages/ai-openai/src/adapters/transcription.ts
packages/ai-openai/src/audio/transcription-provider-options.ts
packages/ai-openai/tests/transcription-adapter.test.ts
packages/ai/skills/ai-core/media-generation/SKILL.md
packages/ai/src/activities/generateTranscription/index.ts
packages/ai/src/types.ts

coderabbitai · 2026-05-28T11:11:06Z

Actionable comments posted: 0

nx-cloud · 2026-06-04T03:52:48Z

View your CI Pipeline Execution ↗ for commit fbb57a0

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| `nx affected --targets=test:sherif,test:knip,tes...` | ✅ Succeeded | 4m 17s | View ↗ |
| `nx run-many --targets=build --exclude=examples/...` | ✅ Succeeded | 1m 8s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-06-04 03:56:19 UTC

pkg-pr-new · 2026-06-04T03:53:28Z


@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@651

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@651

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@651

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@651

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@651

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@651

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@651

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@651

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@651

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@651

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@651

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@651

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@651

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@651

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@651

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@651

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@651

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@651

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@651

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@651

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@651

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@651

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@651

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@651

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@651

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@651

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@651

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@651

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@651

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@651

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@651

commit: fbb57a0

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/transcription.md`:
- Line 561: The example hardcodes 'whisper-1' in the createOpenaiTranscription
call; update the docs to use the provider's latest transcription model constant
exported from the OpenAI adapter's model-meta.ts instead of a string literal.
Import or reference the exported latest-model symbol from that file (e.g., the
adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that
symbol into createOpenaiTranscription so the docs always use the adapter-defined
current OpenAI transcription model.


coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1


🛑 Comments failed to post (1)

docs/media/transcription.md (1)
561-561: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the provider’s latest OpenAI transcription model in this example.

This changed snippet still hardcodes whisper-1; please update it to the latest OpenAI transcription model defined in the adapter model-meta.ts to keep docs aligned with project policy.

As per coding guidelines: “Use the latest model per provider in documentation example code, sourced from each adapter's model-meta.ts (newest gpt-*, claude-*, gemini-*, …)”.

tombeckenham · 2026-06-04T04:03:58Z

Hi @8times4, thank you for this. Would you be able to create an e2e test for this using aimock? The tests are in the e2e test package. Ideally, a way to see the results on one of the ts-react-chat example pages would be great as well.

tombeckenham · 2026-06-04T04:19:35Z

Code review

Found 3 issues:

No E2E test coverage was added for the diarization feature (new gpt-4o-transcribe-diarize model, diarized_json responseFormat, speaker-labeled TranscriptionSegments, chunking_strategy and known_speaker_* options, and validation). (CLAUDE.md says "Every feature, bug fix, or behavior change MUST include E2E test coverage." and "Add or update E2E tests — this is mandatory for any feature, bug fix, or behavior change"; see also the new-feature row in the E2E table and the Pre-PR Quality Gate, which requires pnpm --filter @tanstack/ai-e2e test:e2e. AGENTS.md and the reviews of prior transcription PRs #409 ("feat: extract @tanstack/openai-base and @tanstack/ai-utils packages") and #506 ("feat(ai-grok): audio, speech, and realtime adapters + example wiring") establish the same convention: update feature-support.ts, test-matrix, fixture, and spec.)

ai/packages/ai-openai/src/adapters/transcription.ts

Lines 140 to 150 in 05dfb53

    
        id: generateId(this.name),
        model,
        text: response.text,
        duration: response.duration,
        ...(segments.length > 0 && { segments }),
      }
    }
    if (useVerbose) {
      const response = (await this.client.audio.transcriptions.create({
        ...request,

The responseFormat union literal is duplicated (with | 'diarized_json' added) across three locations instead of being extracted into a shared type. (CLAUDE.md says "Always look for repeated code or if the function you are trying to implement is already in another file" and "Review code at the end to see if you can make it more concise and less repetitive".)

ai/packages/ai/src/types.ts

Lines 1723 to 1732 in 05dfb53

    
      confidence?: number
      /** Speaker identifier, if diarization is enabled */
      speaker?: string
    }
    /**
     * A single word with timing information.
     */
    export interface TranscriptionWord {
      /** The transcribed word */
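
One way to remove the duplication noted above is to declare the formats once and derive the union from that declaration. The sketch below is only an assumption about how that could look: the type and constant names are hypothetical, and the list of members is taken from the formats mentioned in this thread, so it may not match the real unions exactly.

```typescript
// Single source of truth for the response formats (assumed member list).
const TRANSCRIPTION_RESPONSE_FORMATS = [
  'json',
  'text',
  'srt',
  'vtt',
  'verbose_json',
  'diarized_json',
] as const

// Derived union type; the three call sites could import this instead of repeating literals.
type TranscriptionResponseFormat = (typeof TRANSCRIPTION_RESPONSE_FORMATS)[number]

// Runtime check that narrows an arbitrary string to the shared union.
function isTranscriptionResponseFormat(
  value: string,
): value is TranscriptionResponseFormat {
  const formats: ReadonlyArray<string> = TRANSCRIPTION_RESPONSE_FORMATS
  return formats.includes(value)
}
```

Deriving the type from a `const` array means adding a future format is a one-line change that updates both the compile-time union and the runtime check.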

Validation guards in the newly added validateDiarizationOptions (and its caller guard) are inconsistent with modelOptions conventions and incomplete:
- responseFormat is read from modelOptions through a camelCase cast, while the spread and all other fields use snake_case (response_format, chunking_strategy, known_speaker_*).
- Prompt rejection and the diarization-options guard inspect only the top-level options, not the modelOptions paths.
- The diarize-only restriction on chunking_strategy does not check modelOptions?.chunking_strategy.

These gaps allow bypasses that surface as late 400s instead of early errors. (CLAUDE.md says "Don't create fallback code. It hides problems. Just display errors to the user".)

ai/packages/ai-openai/src/adapters/transcription.ts

Lines 339 to 370 in 05dfb53

    
          )
        }
      }
      protected mapResponseFormat(
        format?: OpenAITranscriptionResponseFormat,
      ): OpenAITranscriptionResponseFormat {
        if (!format) return 'json'
        return format
      }
    }
    /**
     * Creates an OpenAI transcription adapter with explicit API key.
     * Type resolution happens here at the call site.
     *
     * @param model - The model name (e.g., 'whisper-1')
     * @param apiKey - Your OpenAI API key
     * @param config - Optional additional configuration
     * @returns Configured OpenAI transcription adapter instance with resolved types
     *
     * @example
     * ```typescript
     * const adapter = createOpenaiTranscription('whisper-1', "sk-...");
     *
     * const result = await generateTranscription({
     *   adapter,
     *   audio: audioFile,
     *   language: 'en'
     * });
     * ```
     */
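
The bypass described in the third issue comes from checking only one of the two places an option can be set. A guard that inspects both could be sketched as follows. This is an illustration under assumptions: `findDiarizationOptionErrors` is a hypothetical helper, options are modelled as plain records, and the rules encoded (no `prompt` on diarize models, `chunking_strategy` and `known_speaker_*` only on diarize models) are taken from the review comment rather than from the adapter source.

```typescript
type Options = Record<string, unknown>

// Hypothetical guard: returns every violation found in top-level options and modelOptions.
function findDiarizationOptionErrors(
  isDiarizeModel: boolean,
  topLevel: Options,
  modelOptions: Options = {},
): Array<string> {
  const errors: Array<string> = []
  // Inspect both places a caller can set an option, so neither path bypasses validation.
  const sources: Array<[string, Options]> = [
    ['options', topLevel],
    ['modelOptions', modelOptions],
  ]
  for (const [label, source] of sources) {
    for (const [key, value] of Object.entries(source)) {
      // Keys explicitly set to undefined count as absent.
      if (value === undefined) continue
      if (isDiarizeModel && key === 'prompt') {
        errors.push(`${label}.prompt is not supported by diarization models`)
      }
      const diarizeOnly =
        key === 'chunking_strategy' || key.startsWith('known_speaker_')
      if (!isDiarizeModel && diarizeOnly) {
        errors.push(`${label}.${key} requires a diarization model`)
      }
    }
  }
  return errors
}
```

A caller would throw when the returned array is non-empty, which reports the problem up front rather than relying on the API's 400 response.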

🤖 Generated with Claude Code


coderabbitai Bot reviewed May 27, 2026


Comment thread packages/ai-openai/src/adapters/transcription.ts

AlemTuzlak requested a review from tombeckenham June 3, 2026 14:45

8times4 added 2 commits June 4, 2026 13:39

add diarization support

c82735e

fix coderabbit recommendations

fbb57a0

tombeckenham force-pushed the feat/openai-transcription-diarization branch from 05dfb53 to fbb57a0 June 4, 2026 03:47

coderabbitai Bot reviewed Jun 4, 2026
